Are Thesauri Useful in Cross-Language Information Retrieval?

Authors

  • Vivien Petras
  • Natalia Perelman
  • Fredric C. Gey
Abstract

Digital libraries relating to particular subject domains have invested a great deal of human effort in developing metadata in the form of subject area thesauri. This effort has emerged more recently in artificial intelligence as ontologies or knowledge bases which organize particular subject areas. The purpose of subject area thesauri is to provide organization of the subject into logical, semantic divisions as well as to index document collections for effective browsing and retrieval. Prior to free-text indexing (i.e. the bag-of-words approach to information retrieval), subject area thesauri provided the only point of entry (or 'entry vocabulary') to retrieve documents. A debate began over thirty years ago about the relative utility of the two approaches to retrieval:

Similar resources

Cross-Language Information Retrieval in a Multilingual Legal Domain

We describe here the application of a cross-language information retrieval technique based on similarity thesauri in the domain of Swiss law. We present the theory of similarity thesauri, which are information structures derived from corpora, and show how they can be used for cross-language retrieval. We also discuss the collections of Swiss legal documents and show how we have used them to co...

Automatically-extracted Thesauri for Cross-language Ir: When Better Is Worse

A statistical algorithm for extracting bilingual term dictionaries (thesauri) from parallel text is presented, along with refinements for improving their size and accuracy. Somewhat paradoxically, increasing the accuracy of the extracted thesaurus can in fact reduce the performance of an IR system using it to perform query translation for cross-language information retrieval.

Similarity Thesauri and Cross-Language Retrieval

This paper describes a method for constructing a thesaurus automatically from a corpus of suitable documents, using standard information retrieval methods. The resulting thesauri can be used for user-initiated query expansion, automatic query expansion, as well as cross-language retrieval. Researchers at the Swiss Federal Institute of Technology in Zürich developed and evaluated this method in ...

Automatic processing of multilingual medical terminology: applications to thesaurus enrichment and cross-language information retrieval

OBJECTIVES We present in this article experiments on multi-language information extraction and access in the medical domain. For such applications, multilingual terminology plays a crucial role when working on specialized languages and specific domains. MATERIAL AND METHODS We propose firstly a method for enriching multilingual thesauri which extracts new terms from parallel corpora, and seco...

Thesaurus Mapping for Promoting Semantic Interoperability of European Public Services

Interoperability of eGovernment information systems is essential to provide advanced services to citizens. This work proposes a framework for implementing interoperability among thesauri for promoting cross-collection and cross-language information retrieval, as well as a specific approach within such framework on a case study aimed at mapping five thesauri of interest for the European Union in...

Publication date: 2002